SNOW-1458135 Implement DataFrame and Series initialization with lazy Index objects #2137

Open
wants to merge 60 commits into base: main

Conversation

sfc-gh-vbudati (Contributor)

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1458135

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
  3. Please describe how your code solves the related issue.

  • Implemented functionality to enable creating Series and DataFrame objects with a lazy Index object as the data, index, and/or columns.
  • This also covers creating Series and DataFrames with rows/columns that don't exist in the given data.
  • A special case is when the data is a Series or DataFrame object: the new Series or DataFrame is created by filtering the data with the provided index and columns.
  • If some values in index don't exist in data's index, they are added as new rows whose data values are NaN.
  • If some values in columns don't exist in data's columns, they are added as new all-NaN columns.
  • Internally, a right outer join adds the new index values, and the new NaN columns are created and appended (see the native pandas illustration below).
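
For illustration, the intended semantics mirror native pandas reindexing behavior; a minimal sketch using only native pandas (so it runs without a Snowflake session):

import pandas as pd

# Source data has index [0, 1, 2] and columns ["a", "b"].
data = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Index value 3 and column "c" do not exist in `data`: the new row and the
# new column are filled with NaN, matching the filtering semantics above.
result = pd.DataFrame(data, index=[0, 1, 3], columns=["a", "c"])
print(result)
#      a   c
# 0  1.0 NaN
# 1  2.0 NaN
# 3  NaN NaN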

…y-index

# Conflicts:
#	src/snowflake/snowpark/modin/plugin/compiler/snowflake_query_compiler.py
@sfc-gh-vbudati sfc-gh-vbudati added the NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs label Aug 21, 2024
@sfc-gh-vbudati (Contributor Author)

All of the join counts in the tests have increased because, during DataFrame/Series creation with a non-Snowpark pandas object as data and a Snowpark pandas Index as index, a join is now performed instead of converting the index to pandas (which would incur an extra query).

In some tests the join count is much higher, but that is due to how they are written: some tests call to_pandas() multiple times, which multiplies the join count.

@sfc-gh-azhan (Collaborator) left a comment:

Thanks for doing this! It's a lot of work btw.

Please also check:

  1. If you identify test code that can be improved, please add a TODO and track it with a Jira issue.
  2. Please run a Jenkins job before merging to see if anything goes wrong there.

tests/integ/modin/test_concat.py
@sfc-gh-azhan (Collaborator) left a comment:

The DataFrame and Series constructors are quite similar; can we reuse the code?

@@ -1073,6 +1073,11 @@ class TestSeriesGroupBy:
@pytest.mark.parametrize("by", ["string_col_1", ["index", "string_col_1"], "index"])
def test_dataframe_groupby_getitem(self, by, func, dropna, group_keys, sort):
"""Test apply() on a SeriesGroupBy that we get by DataFrameGroupBy.__getitem__"""
qc = (
6
Collaborator:

Can we keep using QUERY_COUNT_WITH_TRANSFORM_CHECK and explain why the value changed to 5?

Contributor Author:

Done! The query count went down because we no longer convert the index to pandas in the DataFrame constructor.

eval_snowpark_pandas_result(
*create_test_series({"a": range(len(datecol))}, index=datecol),
*create_test_series({"2024-01-01": list(range(len(datecol)))}, index=datecol),
Collaborator:

I forgot to check this. I'll do it today.

tests/integ/modin/series/test_iloc.py
date_index = native_pd.date_range("1/1/2010", periods=6, freq="D")
native_series = native_pd.Series(
{"prices": [100, 101, np.nan, 100, 89, 88]}, index=date_index
{"1/1/2023": [100, 101, np.nan, 100, 89, 88]}, index=date_index
Collaborator:

So this is still not able to be fixed?

src/snowflake/snowpark/modin/pandas/dataframe.py

else:
# CASE 5: Non-Snowpark pandas data
Collaborator:

why not just call from_pandas for all cases?
What is case 5.A?

Collaborator:

I believe the only special case is
Special case: data is a dictionary where all the values are Snowpark pandas Series

Contributor Author:

I refactored it all - the two special cases are a dict or a list where all the elements are Snowpark pandas objects.

src/snowflake/snowpark/modin/pandas/dataframe.py
…y-index

# Conflicts:
#	src/snowflake/snowpark/modin/pandas/dataframe.py
#	src/snowflake/snowpark/modin/plugin/extensions/index.py
@@ -1995,3 +1996,68 @@ def create_frame_with_data_columns(
def rindex(lst: list, value: int) -> int:
"""Find the last index in the list of item value."""
return len(lst) - lst[::-1].index(value) - 1


def convert_index_to_qc(index: Any) -> Any:
Contributor Author:

I'm not sure how to make the return type "SnowflakeQueryCompiler" without causing circular import issues

Collaborator:

using "SnowflakeQueryCompiler" with quotes

Contributor Author:

That does not work either - it still causes the same circular import issue.
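
One workaround worth trying (a common pattern, not necessarily what this PR ended up doing): guard the import with typing.TYPE_CHECKING so it is only evaluated by type checkers and never at runtime, e.g.:

from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Evaluated only by type checkers, never at runtime, so it cannot
    # trigger the circular import.
    from snowflake.snowpark.modin.plugin.compiler.snowflake_query_compiler import (
        SnowflakeQueryCompiler,
    )


def convert_index_to_qc(index: Any) -> SnowflakeQueryCompiler:
    """Convert an index-like object to a query compiler (body elided)."""
    ...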

if isinstance(index, DataFrame): # pandas raises the same error
raise ValueError("Index data must be 1-dimensional")

if dtype == "category":
Collaborator:

Where does this check come from? I don't see it being checked anywhere before.

Contributor Author:

I added this check because it was not performed before - we do not support the Categorical type yet. If the user passes in dtype="category", the resulting Series/DataFrame would have a categorical dtype.
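
A minimal sketch of what such a guard could look like (the helper name and error message here are hypothetical, not the PR's actual code):

import pandas as native_pd


def _raise_if_categorical(dtype) -> None:
    # Hypothetical guard: reject dtype="category" (or an explicit
    # CategoricalDtype) up front, since categorical data is not supported yet.
    if dtype is not None and isinstance(
        native_pd.api.types.pandas_dtype(dtype), native_pd.CategoricalDtype
    ):
        raise NotImplementedError("CategoricalDtype is not yet supported")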

Collaborator:

The dtype seems to be used only when data is local; typically in that case we should already apply the dtype check before uploading the data, so we shouldn't need such a check here. Is it not erroring out today?

Contributor Author:

No, it's not erroring out here today - the data itself is not categorical, but it should be treated as categorical when dtype is "category".


# -----------------------------
if query_compiler is not None:
# CASE I: query_compiler
# If a query_compiler is passed in only use the query_compiler field to create a new DataFrame.
Collaborator:

In our docs, we actually document it in a way that allows both to be provided and used, as follows:

Notes

DataFrame can be created either from passed data or query_compiler. If both parameters are provided, data source will be prioritized in the next order:

Modin DataFrame or Series passed with data parameter.

Query compiler from the query_compiler parameter.

Various pandas/NumPy/Python data structures passed with data parameter.

Please make sure the doc and the behavior are consistent.

Contributor Author:

updated the docs!

@@ -459,104 +464,222 @@ def __init__(
# TODO: SNOW-1063346: Modin upgrade - modin.pandas.DataFrame functions
# Siblings are other dataframes that share the same query compiler. We
# use this list to update inplace when there is a shallow copy.
from snowflake.snowpark.modin.pandas.utils import try_convert_index_to_native
from snowflake.snowpark.modin.plugin.extensions.index import Index
Collaborator:

Can the import be at the beginning of the file?

Contributor Author:

done!


elif isinstance(data, Series):
# CASE III: data is a Snowpark pandas Series
query_compiler = data._query_compiler.copy()
Collaborator:

Why are we making a copy of the query compiler here? The query compiler is, in general, immutable.

Contributor Author:

removed the copy logic

axis=1, labels=try_convert_index_to_native(columns)
)
# Reduce the dictionary to only the relevant columns as the keys.
data = {key: value for key, value in data.items() if key in columns}
Collaborator:

That function is becoming too long; can you see if you can break it down into separate functions like from_query_compiler, from_local_data, etc.?

Contributor Author:

I moved the special cases to their own functions instead of the local data case

@@ -17,7 +17,7 @@
)


@sql_count_checker(query_count=2, join_count=1)
@sql_count_checker(query_count=1, join_count=2)
Collaborator:

I think we should keep a couple of tests with native index objects.

@@ -25,56 +25,56 @@

@pytest.fixture(scope="function")
def df1():
return pd.DataFrame(
return native_pd.DataFrame(
Collaborator:

This change seems a little weird to me: the original code returned a Snowpark pandas DataFrame, but now we return a native pandas DataFrame here? Why is that?

Contributor Author:

I changed this because returning a Snowpark pandas DataFrame makes the query count higher, and to_pandas() is called a lot. I think it's better to convert the object to Snowpark pandas only where needed.

udtf_count=UDTF_COUNT,
join_count=JOIN_COUNT,
Collaborator:

Can you update JOIN_COUNT instead of hard-coding the join count here?

@sfc-gh-vbudati (Contributor Author), Sep 18, 2024:

In most cases the join count is still 1; I modified it to use JOIN_COUNT + 1 instead now.

# CASE I: query_compiler
# If a query_compiler is passed in, only use the query_compiler and name fields to create a new Series.
assert (
data is None
Collaborator:

These checks are very similar for Series and DataFrame; can we unify them?

Contributor Author:

done!

@sfc-gh-azhan (Collaborator) left a comment:

Let's first align with DataFrame init.

@@ -78,7 +78,7 @@ class Series(BasePandasDataset):
c 3
dtype: int64

The keys of the dictionary match with the Index values, hence the Index
The keys of the dictionary match with the Index values, hence the dictionary
Collaborator:

We should keep the original version. Why change this?

Contributor Author:

Because this description is wrong - only the "index" values are used ('x', 'y', 'z') and the dict values ('a', 'b', 'c') are ignored.

Collaborator:

This statement is talking about the example above which should be accurate.

Contributor Author:

fixed!

error_checking_for_init(index, dtype)

# The logic followed here is:
# 1. Create a query_compiler from the provided data. If columns are provided, add/select the columns.
Collaborator:

It's confusing to mix these numbers with the case numbers. Maybe emphasize that these are Step 1, Step 2, etc.

Contributor Author:

I renamed the steps, hopefully this is easier to read

if data.name is None:
# If no name is provided, the default name is 0.
query_compiler = query_compiler.set_columns(columns or [0])
if columns is not None and data.name not in columns:
Collaborator:

Should this be an elif?

Collaborator:

maybe do this

if columns is None:
    # handle all cases here
elif data.name in columns:
    # handle the case here
else:
    # handle data.name not in columns

You didn't handle this case here:
pd.DataFrame(pd.Series([1,2,3], name="b"), columns=["a", "b"])

Contributor Author:

I refactored it, it should be handled now and I added a test for it
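
For reference, this is the native pandas behavior for the case called out above (the Series name "b" fills its column and the extra requested column "a" becomes NaN); the expected output is shown in the comments:

import pandas as pd

result = pd.DataFrame(pd.Series([1, 2, 3], name="b"), columns=["a", "b"])
print(result)
#     a  b
# 0 NaN  1
# 1 NaN  2
# 2 NaN  3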


elif isinstance(data, Series):
# CASE III: data is a Snowpark pandas Series
query_compiler = data._query_compiler.copy()
Collaborator:

We don't need to copy the query compiler here, right?

Contributor Author:

removed the copy logic

extra_columns = [col for col in columns if col not in data.columns]
else:
extra_columns = []
query_compiler = data._query_compiler.create_qc_with_extra_columns(
Collaborator:

just use loc like df[extra_columns] = None.

if len(data) and all(
isinstance(v, (Index, BasePandasDataset)) for v in data
):
# Special case V.c: data is a list/dict where all the values are Snowpark pandas objects.
Collaborator:

cover them in a helper function

Contributor Author:

Moved special cases to helper functions

# The logic followed here is:
# 1. Create a query_compiler from the provided data. If columns are provided, add/select the columns.
# 2. If an index is provided, set the index through set_index or reindex.
# 3. If the data is a DataFrame, perform loc to select the required index and columns from the DataFrame.
Collaborator:

You should move all column operations into Step 1. It is confusing to select columns here again.

Contributor Author:

done!

columns=try_convert_index_to_native(columns),
dtype=dtype,
copy=copy,
# 3. If data is a DataFrame, filter result
Collaborator:

You should move all column operations into Step 1. It is confusing to select columns here again.

Contributor Author:

done!

distributed_frame = from_non_pandas(data, index, columns, dtype)
if distributed_frame is not None:
self._query_compiler = distributed_frame._query_compiler
new_name = data.name
Collaborator:

Are you missing these cases?
pd.DataFrame(pd.Index([1,2,3], name = 'b'), columns = ['a']).
pd.DataFrame(pd.Index([1,2,3], name = 'b'), columns = ['a', 'b']).

Contributor Author:

should be taken care of now


# The logic followed here is:
# STEP 1: Create a query_compiler from the provided data. If columns are provided, add/select the columns.
# STEP 2: If an index is provided, set the index through set_index or reindex.
Collaborator:

This does not match your implementation. Some index handling happens in Step 1 now.

Collaborator:

Let's implement it following these steps:

  1. handle data
  2. handle columns
  3. handle index

In Step 1, we make sure the query_compiler is updated for all lazy data. If query_compiler is None, that means the data is local; make sure to convert the data to the right native DataFrame in this step.

-- here we can test all cases of pd.DataFrame(data=any, columns=None, index=None)

In Step 2, handle columns based on whether query_compiler is None or not

-- here we can test all cases of pd.DataFrame(data=any, columns=not none, index=None)

In Step 3, similarly handle the index based on whether query_compiler is None or not.

-- here we can test all cases of pd.DataFrame(data=any, columns=any, index=any)
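
A minimal, self-contained sketch of that ordering using native pandas only (the helper name and the collapsed lazy/local branches are assumptions for illustration, not this PR's code):

from typing import Any, Optional, Sequence

import pandas as pd


def build_frame(data: Any, columns: Optional[Sequence] = None,
                index: Optional[Sequence] = None) -> pd.DataFrame:
    # Step 1: handle data. This is where a lazy Snowpark pandas input would
    # yield a query compiler; local data is normalized to a native DataFrame.
    frame = data if isinstance(data, pd.DataFrame) else pd.DataFrame(data)

    # Step 2: handle columns. Requested columns missing from the data become
    # NaN columns; columns not requested are dropped.
    if columns is not None:
        frame = frame.reindex(columns=columns)

    # Step 3: handle index.
    if index is not None:
        if isinstance(data, (pd.DataFrame, pd.Series)):
            # Data already carries an index: filter/extend to the requested
            # labels (the lazy implementation uses a join for this step).
            frame = frame.reindex(index=index)
        else:
            # Local data without a meaningful index: simply attach the labels.
            frame.index = pd.Index(index)

    return frame


# Example: extra column "b" and new index labels, filled as expected.
print(build_frame({"a": [1, 2, 3]}, columns=["a", "b"], index=["x", "y", "z"]))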

pytest.param(
"series",
marks=pytest.mark.xfail(
reason="SNOW-1675191 reindex does not work with tuple series"

@sfc-gh-yzou (Collaborator) left a comment:

@sfc-gh-vbudati @sfc-gh-azhan mentioned that the main purpose of this PR is to remove a to_pandas materialization; can we just do that in this PR and move the other refactoring out of the current PR?

name = data.name
from snowflake.snowpark.modin.plugin.extensions.index import Index

# Setting the query compiler
Collaborator:

One more general comment about the change: our original code behaves such that if both data and a query compiler are provided, the data is used.
However, here it seems we want to change it so that only one of them can be provided. I think that is fine; however, please make sure we update the doc to make this clear.

Here are a couple of points:

  1. From a structural point of view, I think we can do the parameter checks first, for example where both query_compiler and other parameters are provided, and then initialize the query_compiler following the original code structure, unless some cases work very differently.
  2. The check messages don't seem very clear. For example, "query_compiler and index can not be provided together" might be better phrased as "index is not supported when query_compiler is provided", etc.

Contributor Author:

I can make the error messages clearer as you pointed out in (2), e.g. "index is not supported when query_compiler is provided". But the parameters are currently checked before they are used, and I don't think there are any cases in the code where both a query compiler and data/index/columns are provided (no tests have failed so far in relation to this). I think this is also the simpler behavior to have.
The doc should also be updated with this behavior.


if hasattr(data, "name") and data.name is not None:
# If data is an object that has a name field, use that as the name of the new Series.
name = data.name
# If any of the values are Snowpark pandas objects, convert them to native pandas objects.
Collaborator:

In this case, shouldn't we try to convert the other values to Snowpark pandas objects instead of pulling them to local? Or maybe we should just error out.

Do you have an example of this case?

Contributor Author:

One example where it's better to convert to pandas is this:

data = {"A": pd.Series([1, 2, 3]), "B": pd.Index([4, 5, 6]), "C": 5}
pd.DataFrame(data)
Out[58]: 
   A  B  C
0  1  4  5
1  2  5  5
2  3  6  5

5 is put in every single row even though it's a scalar in the dict

@sfc-gh-vbudati (Contributor Author)

@sfc-gh-yzou I prefer not to make the refactor changes in a new PR since I think this one is very close to merging, and it would take a lot more work to separate the index changes from it.

columns = ensure_index(columns)

# The logic followed here is:
# STEP 1: Obtain the query_compiler from the provided data if the data is lazy. If data is local, the query
Collaborator:

Suggested change
# STEP 1: Obtain the query_compiler from the provided data if the data is lazy. If data is local, the query
# STEP 1: Obtain the query_compiler from the provided data if the data is lazy. If data is local, keep the query

# STEP 1: Obtain the query_compiler from the provided data if the data is lazy. If data is local, the query
# compiler is None.
# STEP 2: If columns are provided, set the columns if data is lazy.
# STEP 3: If both the data and index are local (or index is None), create a query compiler from pandas.
Collaborator:

If data is local, create a query compiler from it with local index.

# compiler is None.
# STEP 2: If columns are provided, set the columns if data is lazy.
# STEP 3: If both the data and index are local (or index is None), create a query compiler from pandas.
# STEP 4: Otherwise, set the index through set_index or reindex.
Collaborator:

For lazy index, set the index through set_index or reindex.

@sfc-gh-azhan (Collaborator)

@sfc-gh-yzou I prefer not making the refactor changes in a new PR since I think this one is very close to merging and it will take a lot more work to separate the index changes from this

I kind of agree with @sfc-gh-yun that this PR is becoming too big. Can we use this as the PoC draft PR and review smaller PRs one by one? You can either start with the refactoring pieces first or fix the lazy index first. Try to make sure the refactoring PR only does refactoring and has no test changes.

@sfc-gh-vbudati (Contributor Author)

@sfc-gh-azhan @sfc-gh-yzou I can try to separate this PR into two other PRs - one for the lazy index change and the other for the refactor. It is impossible to avoid test changes in the refactor PR since I introduced functionality to allow passing non-existent columns or index values to the constructor. The constructors should be able to handle any kind of input, and I added tests for this.

However, that requires me to make a non-trivial amount of redundant code changes; for example, the same set of tests would be changed in both PRs, where the query counts will likely differ due to the refactor. I was hoping to work on IR tickets from Monday, so I still prefer merging this PR as is; please let me know if you both feel strongly about this.

In the future, I'd really appreciate it if feedback about splitting PRs were brought up earlier.

Labels: NO-PANDAS-CHANGEDOC-UPDATES (This PR does not update Snowpark pandas docs), snowpark-pandas
4 participants